Abstract

The model of a side information "vending machine" (VM) accounts for scenarios in which the
measurement of side information sequences can be controlled via the selection of cost-constrained
actions. In this paper, the three-node cascade source coding problem is studied under the assumption
that a side information VM is available at the intermediate and/or at the end node of the cascade. A
single-letter characterization of the achievable trade-off among the transmission rates, the distortions in
the reconstructions at the intermediate and at the end node, and the cost for acquiring the side information
is derived for a number of relevant special cases. It is shown that a joint design of the description of
the source and of the control signals used to guide the selection of the actions at downstream nodes
is generally necessary for an efficient use of the available communication links. In particular, for all
the considered models, layered coding strategies prove to be optimal, whereby the base layer fulfills
two network objectives: determining the actions of downstream nodes and simultaneously providing a
coarse description of the source. Design of the optimal coding strategy is shown via examples to depend
on both the network topology and the action costs. Examples also illustrate the involved performance
trade-offs across the network.
I. Introduction



The concept of a side information "vending machine" (VM) was introduced in [1] for a point-
to-point model, in order to account for source coding scenarios in which acquiring the side
information at the receiver entails some cost and thus should be done efficiently. In this class
of models, the quality of the side information $Y$ can be controlled at the decoder by selecting
an action $A$ that affects the effective channel between the source $X$ and the side information $Y$
through a conditional distribution $p_{Y|X,A}(y|x,a)$. Each action $A$ is associated with a cost, and
the problem is that of characterizing the available trade-offs between rate, distortion and action
cost.

Extending the point-to-point set-up, cascade models provide baseline scenarios in which to 
study fundamental aspects of communication in multi-hop networks, which are central to the 
operation of, e.g., sensor or computer networks (see Fig. 1). Standard information-theoretic
models for cascade scenarios assume the availability of given side information sequences at
the nodes (see, e.g., [2]-[4]). In this paper, instead, we account for the cost of acquiring the
side information by introducing side information VMs at an intermediate node and/or at the
final destination of a cascade model. As an example of the applications of interest, consider the
computer network of Fig. 1, where the intermediate and end nodes can obtain side information
from remote data bases, but only at the cost of investing system resources such as time or 
bandwidth. Another example is a sensor network in which acquiring measurements entails an 
energy cost. 



Figure 1. A multi-hop computer network in which intermediate and end nodes can access side information by interrogating 
remote data bases via cost-constrained actions. 

As shown in [1] for a point-to-point system, the optimal operation of a VM at the decoder
requires taking actions that are guided by the message received from the encoder. This implies 
the exchange of an explicit control signal embedded in the message communicated to the decoder 
that instructs the latter on how to operate the VM. Generalizing to the cascade models under 
study, a key issue to be tackled in this work is the design of communication strategies that strike 
the right balance between control signaling and source compression across the two hops. 

A. Related Work 

As mentioned, the original paper [1] considered a point-to-point system with a single encoder
and a single decoder. Various works have extended the results in [1] to multi-terminal models.
Specifically, [5], [6] considered a set-up analogous to the Heegard-Berger problem [7], [8], in
which the side information may or may not be available at the decoder. The more general case in
which both decoders have access to the same vending machine, and either the side information
sequences produced by the vending machine at the two decoders satisfy a degradedness condition, or
lossless source reconstructions are required at the decoders, is solved in [5]. In [9], a distributed
source coding setting that extends [10] to the case of a decoder with a side information VM is
investigated, along with a cascade source coding model to be discussed below. Finally, in [11],
a related problem is considered in which the sequence to be compressed is dependent on the
actions taken by a separate encoder.

The problem of characterizing the rate-distortion region for cascade source coding models, 
even with conventional side information sequences (i.e., without VMs as in Fig. 2) at Node 2
and Node 3, is generally open. We refer to [0 and references therein for a review of the state 
of the art on the cascade problem and to [3] for the cascade-broadcast problem.

In this work, we focus on the cascade source coding problem with side information VMs. The 
basic cascade source coding model consists of three nodes arranged so that Node 1 communicates
with Node 2 and Node 2 with Node 3 over finite-rate links, as illustrated for a computer network
scenario in Fig. 1 and schematically in Fig. 2-(a). Both Node 2 and Node 3 wish to reconstruct
a, generally lossy, version of source $X$ and have access to different side information sequences.
An extension of the cascade model is the cascade-broadcast model of Fig. 2-(b), in which an
additional "broadcast" link of rate $R_b$ exists that is received by both Node 2 and Node 3.

Figure 2. (a) Cascade source coding problem and (b) cascade-broadcast source coding problem.

Two specific instances of the models in Fig. 2 for which a characterization of the rate-distortion
performance has been found are the settings considered in [4] and that in [12], which we briefly
review here for their relevance to the present work. In [4], the cascade model in Fig. 2-(a) was
considered for the special case in which the side information $Y$ measured at Node 2 is also
available at Node 1 (i.e., $X = (X, Y)$) and we have the Markov chain $X - Y - Z$ so that the
side information at Node 3 is degraded with respect to that of Node 2. Instead, in [12], the
cascade-broadcast model in Fig. 2-(b) was considered for the special case in which either rate $R_b$
or $R_1$ is zero, and the reconstructions at Node 1 and Node 2 are constrained to be retrievable
also at the encoder in the sense of the Common Reconstruction (CR) introduced in [13] (see
below for a rigorous definition).

B. Contributions 

In this paper, we investigate the source coding models of Fig. 2 by assuming that some of
the side information sequences can be affected by the actions taken by the corresponding nodes 
via VMs. The main contributions are as follows. 

• Cascade source coding problem with VM at Node 3 (Fig. 3): In Sec. II-B, we derive the
achievable rate-distortion-cost trade-offs for the set-up in Fig. 3 in which a side information
VM exists at Node 3, while the side information $Y$ is known at both Node 1 and Node 2
and satisfies the Markov chain $X - Y - Z$. This characterization extends the result of [4]
discussed above to a model with a VM at Node 3. We remark that in [9], the rate-distortion-cost
characterization for the model in Fig. 3 was obtained, but under the assumption that
the side information at Node 3 be available in a causal fashion in the sense of [14];

• Cascade-broadcast source coding problem with VM at Node 2 and Node 3, lossless compression
(Fig. 4): In Sec. III-B, we study the cascade-broadcast model in Fig. 4 in which
a VM exists at both Node 2 and Node 3. In order to enable the action to be taken by both
Node 2 and Node 3, we assume that the information about which action should be taken
by Node 2 and Node 3 is sent by Node 1 on the broadcast link of rate $R_b$. Under the
constraint of lossless reconstruction at Node 2 and Node 3, we obtain a characterization of
the rate-cost performance. This conclusion generalizes the result in [5] discussed above to
the case in which the rates $R_1$ and/or $R_2$ are non-zero;

• Cascade-broadcast source coding problem with VM at Node 2 and Node 3, lossy compression
with CR constraint (Fig. 4): In Sec. III-D, we tackle the problem in Fig. 4 but
under the more general requirement of lossy reconstruction. Conclusive results are obtained
under the additional constraints that the side information at Node 3 is degraded and that
the source reconstructions at Node 2 and Node 3 can be recovered with arbitrarily small
error probability at Node 1. This is referred to as the CR constraint following [13], and
is of relevance in applications in which the data being sent is of a sensitive nature and
unknown distortions in the receivers' reconstructions are not acceptable (see [13] for further
discussion). This characterization extends the result of [12] mentioned above to the set-up
with a side information VM, and also in that both rates $R_1$ and $R_b$ are allowed to be
non-zero;

• Adaptive actions: Finally, we revisit the results above by allowing the decoders to select their
actions in an adaptive way, based not only on the received messages but also on the previous
samples of the side information, extending [15]. Note that the effect of adaptive actions on the
rate-distortion-cost region was open even for the simple point-to-point system with a non-causal
side information VM at the decoder until recently, when [15] showed that
adaptive actions do not improve the rate-distortion-cost performance of the point-to-point system.
In this paper, we extend this result to the multi-terminal framework and conclude
that, in all of the considered examples, where applicable, adaptive selection of the actions
does not improve the achievable rate-distortion-cost trade-offs.

Figure 3. Cascade source coding problem with a side information "vending machine" at Node 3.

Our results extend to multi-hop scenarios the conclusion in [1] that a joint representation of data
and control messages enables an efficient use of the available communication links. In particular,
layered coding strategies prove to be optimal for all the considered models, in which the base
layer fulfills two objectives: determining the actions of downstream nodes and simultaneously
providing a coarse description of the source. Moreover, the examples provided in the paper
demonstrate the dependence of the optimal coding design on the network topology and the action costs.

Throughout the paper, we closely follow the notation in [12]. In particular, a random variable
is denoted by an upper case letter (e.g., $X, Y, Z$) and its realization is denoted by a lower
case letter (e.g., $x, y, z$). The shorthand notation $X^n$ is used to denote the tuple (or the column
vector) of random variables $(X_1, \ldots, X_n)$, and $x^n$ is used to denote a realization. The notation
$X^n \sim p(x^n)$ indicates that $p(x^n)$ is the probability mass function (pmf) of the random vector
$X^n$. Similarly, $Y^n | \{X^n = x^n\} \sim p(y^n|x^n)$ indicates that $p(y^n|x^n)$ is the conditional pmf of $Y^n$
given $\{X^n = x^n\}$. We say that $X - Y - Z$ form a Markov chain if $p(x,y,z) = p(x)p(y|x)p(z|y)$,
that is, $X$ and $Z$ are conditionally independent of each other given $Y$.

II. Cascade Source Coding with a Side Information Vending Machine

In this section, we first describe the system model for the cascade source coding problem
with a side information vending machine of Fig. 3. We then present the characterization of the
corresponding rate-distortion-cost performance in Sec. II-B.

Figure 4. Cascade source coding problem with a side information "vending machine" at Node 2 and Node 3.



A. System Model 

The problem of cascade source coding of Fig. 3 is defined by the probability mass functions
(pmfs) $p_{XY}(x,y)$ and $p_{Z|AY}(z|a,y)$ and discrete alphabets $\mathcal{X}, \mathcal{Y}, \mathcal{Z}, \mathcal{A}, \hat{\mathcal{X}}_1, \hat{\mathcal{X}}_2$, as follows. The
source sequences $X^n$ and $Y^n$ with $X^n \in \mathcal{X}^n$ and $Y^n \in \mathcal{Y}^n$, respectively, are such that the
pairs $(X_i, Y_i)$ for $i \in [1,n]$ are independent and identically distributed (i.i.d.) with joint pmf
$p_{XY}(x,y)$. Node 1 measures sequences $X^n$ and $Y^n$ and encodes them in a message $M_1$ of $nR_1$
bits, which is delivered to Node 2. Node 2 estimates a sequence $\hat{X}_1^n \in \hat{\mathcal{X}}_1^n$ within given distortion
requirements to be discussed below. Moreover, Node 2 maps the message $M_1$ received from Node
1 and the locally available sequence $Y^n$ into a message $M_2$ of $nR_2$ bits, which is delivered to
Node 3. Node 3 wishes to estimate a sequence $\hat{X}_2^n \in \hat{\mathcal{X}}_2^n$ within given distortion requirements.
To this end, Node 3 receives message $M_2$ and, based on this, it selects an action sequence $A^n$,
where $A^n \in \mathcal{A}^n$. The action sequence affects the quality of the measurement $Z^n$ of sequence
$Y^n$ obtained at Node 3. Specifically, given $A^n$ and $Y^n$, the sequence $Z^n$ is distributed
as $p(z^n|a^n,y^n) = \prod_{i=1}^{n} p_{Z|A,Y}(z_i|a_i,y_i)$. The cost of the action sequence is defined by a cost
function $\Lambda: \mathcal{A} \to [0, \Lambda_{\max}]$ with $0 \le \Lambda_{\max} < \infty$, as $\Lambda(a^n) = \sum_{i=1}^{n} \Lambda(a_i)$. The estimated sequence
$\hat{X}_2^n$ with $\hat{X}_2^n \in \hat{\mathcal{X}}_2^n$ is then obtained as a function of $M_2$ and $Z^n$. The estimated sequences $\hat{X}_j^n$ for
$j = 1,2$ must satisfy distortion constraints defined by functions $d_j(x,\hat{x}_j): \mathcal{X} \times \hat{\mathcal{X}}_j \to [0, D_{\max}]$
with $0 \le D_{\max} < \infty$ for $j = 1,2$, respectively. A formal description of the operations at the
encoder and the decoder follows.

Figure 5. Cascade-broadcast source coding problem with a side information "vending machine" at Node 2.

Definition 1. An $(n, R_1, R_2, D_1, D_2, \Gamma, \epsilon)$ code for the set-up of Fig. 3 consists of two source
encoders, namely

$g_1: \mathcal{X}^n \times \mathcal{Y}^n \to [1, 2^{nR_1}]$, (1)

which maps the sequences $X^n$ and $Y^n$ into a message $M_1$;

$g_2: \mathcal{Y}^n \times [1, 2^{nR_1}] \to [1, 2^{nR_2}]$, (2)

which maps the sequence $Y^n$ and message $M_1$ into a message $M_2$; an "action" function

$\ell: [1, 2^{nR_2}] \to \mathcal{A}^n$, (3)

which maps the message $M_2$ into an action sequence $A^n$; two decoders, namely

$h_1: [1, 2^{nR_1}] \times \mathcal{Y}^n \to \hat{\mathcal{X}}_1^n$, (4)

which maps the message $M_1$ and the measured sequence $Y^n$ into the estimated sequence $\hat{X}_1^n$;

$h_2: [1, 2^{nR_2}] \times \mathcal{Z}^n \to \hat{\mathcal{X}}_2^n$, (5)

which maps the message $M_2$ and the measured sequence $Z^n$ into the estimated sequence
$\hat{X}_2^n$; such that the action cost constraint $\Gamma$ and distortion constraints $D_j$ for $j = 1,2$ are satisfied,
i.e.,

$\frac{1}{n} \sum_{i=1}^{n} E[\Lambda(A_i)] \le \Gamma$ (6)

and $\frac{1}{n} \sum_{i=1}^{n} E[d_j(X_i, h_{ji})] \le D_j$ for $j = 1,2$, (7)

where we have defined as $h_{1i}$ and $h_{2i}$ the $i$th symbol of the function $h_1(M_1, Y^n)$ and $h_2(M_2, Z^n)$,
respectively.
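
As a rough illustration of the constraints (6) and (7), the following Python sketch evaluates the per-symbol action cost and distortion for one realization of the sequences. It replaces the expectations in (6)-(7) with empirical averages over a single realization, which is an assumption made only for illustration. The function names, the example cost table and the Hamming distortion are hypothetical choices and are not part of Definition 1.

```python
def average_cost(actions, cost):
    """Per-symbol action cost (1/n) * sum_i cost[a_i] for one action sequence."""
    return sum(cost[a] for a in actions) / len(actions)


def average_distortion(source, estimate, d):
    """Per-symbol distortion (1/n) * sum_i d(x_i, xhat_i) for one realization."""
    return sum(d(x, xh) for x, xh in zip(source, estimate)) / len(source)


def hamming(x, xh):
    """Hamming distortion, used here only as an example distortion measure."""
    return 0 if x == xh else 1


# Hypothetical example: action 1 (measure) costs 1, action 0 (skip) costs 0.
example_cost = {0: 0.0, 1: 1.0}
cost_value = average_cost([0, 1, 1, 0], example_cost)
distortion_value = average_distortion([0, 1, 1, 0], [0, 1, 0, 0], hamming)
```

A code would then be checked against the budgets by comparing `cost_value` with $\Gamma$ and `distortion_value` with $D_j$, keeping in mind that the definition constrains expectations rather than single realizations.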

Definition 2. Given a distortion-cost tuple $(D_1, D_2, \Gamma)$, a rate tuple $(R_1, R_2)$ is said to be
achievable if, for any $\epsilon > 0$ and sufficiently large $n$, there exists an $(n, R_1, R_2, D_1+\epsilon, D_2+\epsilon, \Gamma+\epsilon)$
code.

Definition 3. The rate-distortion-cost region $\mathcal{R}(D_1, D_2, \Gamma)$ is defined as the closure of all rate
tuples $(R_1, R_2)$ that are achievable given the distortion-cost tuple $(D_1, D_2, \Gamma)$.

Remark 1. For side information $Z$ available causally at Node 3, i.e., with decoding function
(5) at Node 3 modified so that $\hat{X}_{2i}$ is a function of $M_2$ and $Z^i$ only, the rate-distortion region
$\mathcal{R}(D_1, D_2, \Gamma)$ has been derived in [9].

In the rest of this section, for simplicity of notation, we drop the subscripts from the definition 
of the pmfs, thus identifying a pmf by its argument. 

B. Rate-Distortion-Cost Region 

In this section, a single-letter characterization of the rate-distortion-cost region is derived. 

Proposition 1. The rate-distortion-cost region $\mathcal{R}(D_1, D_2, \Gamma)$ for the cascade source coding
problem illustrated in Fig. 3 is given by the union of all rate pairs $(R_1, R_2)$ that satisfy the
conditions

$R_1 \ge I(X; \hat{X}_1, A, U | Y)$ (8a)

and $R_2 \ge I(X,Y;A) + I(X,Y;U|A,Z)$, (8b)

where the mutual information terms are evaluated with respect to the joint pmf

$p(x, y, z, a, \hat{x}_1, u) = p(x,y)\,p(\hat{x}_1, a, u|x,y)\,p(z|y,a)$, (9)

for some pmf $p(\hat{x}_1, a, u|x, y)$ such that the inequalities

$E[d_1(X, \hat{X}_1)] \le D_1$, (10a)

$E[d_2(X, f(U,Z))] \le D_2$, (10b)

and $E[\Lambda(A)] \le \Gamma$, (10c)

are satisfied for some function $f: \mathcal{U} \times \mathcal{Z} \to \hat{\mathcal{X}}_2$. Finally, $U$ is an auxiliary random variable whose
alphabet cardinality can be constrained as $|\mathcal{U}| \le |\mathcal{X}||\mathcal{Y}||\mathcal{A}| + 3$, without loss of optimality.

Remark 2. For side information $Z$ independent of the action $A$ given $Y$, i.e., for $p(z|a,y) =
p(z|y)$, the rate-distortion region $\mathcal{R}(D_1, D_2, \Gamma)$ in Proposition 1 reduces to that derived in [4].

The proof of the converse is provided in Appendix A for a more general case of adaptive
actions to be defined in Sec. IV. The achievability follows as a combination of the techniques
proposed in [1] and [4, Theorem 1]. Here we briefly outline the main ideas, since the technical
details follow from standard arguments. For the scheme at hand, Node 1 first maps sequences $X^n$
and $Y^n$ into the action sequence $A^n$ using the standard joint typicality criterion. This mapping
requires a codebook of rate $I(X,Y;A)$ (see, e.g., [16, pp. 62-63]). Given the sequence $A^n$, the
sequences $X^n$ and $Y^n$ are further mapped into a sequence $U^n$. This requires a codebook of rate
$I(X,Y;U|A)$ for each action sequence $A^n$ from standard rate-distortion considerations [16, pp.
62-63]. Similarly, given the sequences $A^n$ and $U^n$, the sequences $X^n$ and $Y^n$ are further mapped
into the estimate $\hat{X}_1^n$ for Node 2 using a codebook of rate $I(X,Y;\hat{X}_1|U,A)$ for each codeword
pair $(U^n, A^n)$. The codewords thus obtained are then communicated to Node 2 and Node 3 as
follows. By leveraging the side information $Y^n$ available at Node 2, conveying the codewords
$A^n$, $U^n$ and $\hat{X}_1^n$ to Node 2 requires rate $I(X,Y;U,A) + I(X,Y;\hat{X}_1|U,A) - I(U,A,\hat{X}_1;Y)$ by
the Wyner-Ziv theorem [16, p. 280], which equals the right-hand side of (8a). Then, sequences
$A^n$ and $U^n$ are sent by Node 2 to Node 3, which requires a rate equal to the right-hand side of
(8b). This follows from the rates of the used codebooks and from the Wyner-Ziv theorem, due to
the side information $Z^n$ available at Node 3 upon application of the action sequence $A^n$. Finally,
Node 3 produces $\hat{X}_2^n$ through a symbol-by-symbol function as $\hat{X}_{2i} = f(U_i, Z_i)$ for
$i \in [1,n]$.

C. Lossless Compression 

Suppose that the source sequence $X^n$ needs to be communicated losslessly at both Node
2 and Node 3, in the sense that $d_j(x, \hat{x}_j)$ is the Hamming distortion measure for $j = 1,2$
($d_j(x, \hat{x}_j) = 0$ if $x = \hat{x}_j$ and $d_j(x, \hat{x}_j) = 1$ if $x \ne \hat{x}_j$) and $D_1 = D_2 = 0$. We can establish the
following immediate consequence of Proposition 1.

Corollary 1. The rate-distortion-cost region $\mathcal{R}(0, 0, \Gamma)$ for the cascade source coding problem
illustrated in Fig. 3 with Hamming distortion metrics is given by the union of all rate pairs
$(R_1, R_2)$ that satisfy the conditions

$R_1 \ge I(X;A|Y) + H(X|A,Y)$ (11a)
and $R_2 \ge I(X,Y;A) + H(X|A,Z)$, (11b)

where the mutual information terms are evaluated with respect to the joint pmf

$p(x, y, z, a) = p(x,y)\,p(a|x,y)\,p(z|y,a)$, (12)

for some pmf $p(a|x,y)$ such that $E[\Lambda(A)] \le \Gamma$.
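
The bounds of Corollary 1 can be evaluated numerically for a fixed choice of $p(a|x,y)$. The Python sketch below is one way to do so, assuming finite alphabets and pmfs stored as dictionaries; the function names and the toy example ($Y = X$ with an action that returns only an erasure symbol) are illustrative assumptions. It computes the right-hand sides of (11a) and (11b) for one candidate $p(a|x,y)$ only, so tracing the region still requires a search over all such pmfs that meet the cost constraint $E[\Lambda(A)] \le \Gamma$.

```python
from math import log2


def entropy(joint, idx):
    """Entropy in bits of the marginal of `joint` on the coordinates in `idx`."""
    marginal = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def corollary1_bounds(p_xy, p_a_given_xy, p_z_given_ya):
    """Right-hand sides of (11a) and (11b) under the joint pmf (12)."""
    X, Y, A, Z = 0, 1, 2, 3  # coordinate positions in each outcome tuple
    joint = {}
    for (x, y), pxy in p_xy.items():
        for a, pa in p_a_given_xy[(x, y)].items():
            for z, pz in p_z_given_ya[(y, a)].items():
                key = (x, y, a, z)
                joint[key] = joint.get(key, 0.0) + pxy * pa * pz

    def h(*idx):
        return entropy(joint, idx)

    i_x_a_given_y = h(X, Y) + h(A, Y) - h(X, A, Y) - h(Y)
    h_x_given_ay = h(X, A, Y) - h(A, Y)
    i_xy_a = h(X, Y) + h(A) - h(X, Y, A)
    h_x_given_az = h(X, A, Z) - h(A, Z)
    return i_x_a_given_y + h_x_given_ay, i_xy_a + h_x_given_az


# Toy example (assumed): Y = X, and action 0 always returns the erasure 'e'.
toy_p_xy = {(0, 0): 0.5, (1, 1): 0.5}
toy_p_a = {(0, 0): {0: 1.0}, (1, 1): {0: 1.0}}
toy_p_z = {(0, 0): {"e": 1.0}, (1, 0): {"e": 1.0}}
bound_r1, bound_r2 = corollary1_bounds(toy_p_xy, toy_p_a, toy_p_z)
```

Since $I(X;A|Y) + H(X|A,Y) = H(X|Y)$, the first returned value does not depend on the choice of $p(a|x,y)$, which gives a quick consistency check on any implementation.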

III. Cascade-Broadcast Source Coding with a Side Information Vending Machine

In this section, the cascade-broadcast source coding problem with a side information vending
machine illustrated in Fig. 4 is studied. At first, the rate-cost performance is characterized
for the special case in which the reproductions at Node 2 and Node 3 are constrained to be
lossless. Then, the lossy version of the problem is considered in Sec. III-D, with an additional
common reconstruction requirement in the sense of [13] and assuming degradedness of the side
information sequences.

A. System Model 

In this section, we describe the general system model for the cascade-broadcast source coding
problem with a side information vending machine. We emphasize that, unlike the setup of Fig. 3,
here the vending machine is at both Node 2 and Node 3. Moreover, we assume that an additional
broadcast link of rate $R_b$ is available that is received by Node 2 and Node 3, in order to enable
Node 2 and Node 3 to take concerted actions that affect the side information sequences. We
assume the action sequence taken by Node 2 and Node 3 to be a function of only the broadcast
message $M_b$ sent over the broadcast link of rate $R_b$.

The problem is defined by the pmfs $p_X(x)$ and $p_{YZ|AX}(y,z|a,x)$ and discrete alphabets $\mathcal{X}, \mathcal{Y}, \mathcal{Z}, \mathcal{A},
\hat{\mathcal{X}}_1, \hat{\mathcal{X}}_2$, as follows. The source sequence $X^n$ with $X^n \in \mathcal{X}^n$ is i.i.d. with pmf $p_X(x)$. Node
1 measures sequence $X^n$ and encodes it into messages $M_1$ and $M_b$ of $nR_1$ and $nR_b$ bits,
respectively, which are delivered to Node 2. Moreover, message $M_b$ is broadcast also to Node 3.
Node 2 estimates a sequence $\hat{X}_1^n \in \hat{\mathcal{X}}_1^n$ and Node 3 estimates a sequence $\hat{X}_2^n \in \hat{\mathcal{X}}_2^n$. To this end,
Node 2 receives messages $M_1$ and $M_b$ and, based only on the latter message, it selects an action
sequence $A^n$, where $A^n \in \mathcal{A}^n$. Node 2 maps messages $M_1$ and $M_b$, received from Node 1, and
the locally available sequence $Y^n$ into a message $M_2$ of $nR_2$ bits, which is delivered to Node 3.
Node 3 receives messages $M_2$ and $M_b$ and, based only on the latter message, it selects an action
sequence $A^n$, where $A^n \in \mathcal{A}^n$. Given $A^n$ and $X^n$, the sequences $Y^n$ and $Z^n$ are distributed as
$p(y^n, z^n|a^n, x^n) = \prod_{i=1}^{n} p_{YZ|A,X}(y_i, z_i|a_i, x_i)$. The cost of the action sequence is defined as in the
previous section. A formal description of the operations at the encoder and the decoder follows.

Definition 4. An $(n, R_1, R_2, R_b, D_1, D_2, \Gamma, \epsilon)$ code for the set-up of Fig. 5 consists of two source
encoders, namely

$g_1: \mathcal{X}^n \to [1, 2^{nR_1}] \times [1, 2^{nR_b}]$, (13)

which maps the sequence $X^n$ into messages $M_1$ and $M_b$, respectively;

$g_2: [1, 2^{nR_1}] \times [1, 2^{nR_b}] \times \mathcal{Y}^n \to [1, 2^{nR_2}]$, (14)

which maps the sequence $Y^n$ and messages $(M_1, M_b)$ into a message $M_2$; an "action" function

$\ell: [1, 2^{nR_b}] \to \mathcal{A}^n$, (15)

which maps the message $M_b$ into an action sequence $A^n$; two decoders, namely

$h_1: [1, 2^{nR_1}] \times [1, 2^{nR_b}] \times \mathcal{Y}^n \to \hat{\mathcal{X}}_1^n$, (16)

which maps messages $M_1$ and $M_b$ and the measured sequence $Y^n$ into the estimated sequence
$\hat{X}_1^n$; and

$h_2: [1, 2^{nR_2}] \times [1, 2^{nR_b}] \times \mathcal{Z}^n \to \hat{\mathcal{X}}_2^n$, (17)

which maps the messages $M_2$ and $M_b$ and the measured sequence $Z^n$ into the estimated sequence
$\hat{X}_2^n$; such that the action cost constraint (6) and distortion constraint (7) are satisfied.

Achievable rates $(R_1, R_2, R_b)$ and the rate-distortion-cost region are defined analogously to
Definitions 2 and 3.

The rate-distortion-cost region for the system model described above is open even for the
case without VM at Node 2 and Node 3 (see [3]). Hence, in the following subsections, we
characterize the rate region for a few special cases. As in the previous section, subscripts are
dropped from the pmf for simplicity of notation.

B. Lossless Compression 

In this section, a single-letter characterization of the rate-cost region $\mathcal{R}(0, 0, \Gamma)$ is derived for
the special case in which the distortion metrics are assumed to be Hamming and the distortion
constraints are $D_1 = 0$ and $D_2 = 0$.

Proposition 2. The rate-cost region $\mathcal{R}(0, 0, \Gamma)$ for the cascade-broadcast source coding problem
illustrated in Fig. 4 with Hamming distortion metrics is given by the union of all rate triples
$(R_1, R_2, R_b)$ that satisfy the conditions

$R_b \ge I(X;A)$ (18a)
$R_1 + R_b \ge I(X;A) + H(X|A,Y)$ (18b)
and $R_2 + R_b \ge I(X;A) + H(X|A,Z)$ (18c)

where the mutual information terms are evaluated with respect to the joint pmf

$p(x, y, z, a) = p(x, a)\,p(y, z|a, x)$, (19)

for some pmf $p(a|x)$ such that $E[\Lambda(A)] \le \Gamma$.

Remark 3. If $R_1 = 0$ and $R_2 = 0$, the rate-cost region $\mathcal{R}(0, 0, \Gamma)$ of Proposition 2 reduces to the one
derived in [5, Theorem 1].

Remark 4. The rate region (18) also describes the rate-distortion region under the more restrictive
requirement of lossless reconstruction in the sense of the probabilities of error $\Pr[X^n \ne \hat{X}_j^n] \le \epsilon$
for $j = 1,2$, as it follows from standard arguments (see [16, Sec. 3.6.4]). A similar conclusion
applies for Corollary 1.

The converse proof for bound (18a) follows immediately since $A^n$ is selected only as a function
of message $M_b$. As for the other two bounds, namely (18b)-(18c), the proof of the converse
can be established following cut-set arguments and using the point-to-point result of [1]. For
achievability, we use the code structure proposed in [1] along with rate splitting. Specifically,
Node 1 first maps sequence $X^n$ into the action sequence $A^n$. This mapping requires a codebook
of rate $I(X;A)$. This rate has to be conveyed over the broadcast link of rate $R_b$ by the definition of the problem and is
thus received by both Node 2 and Node 3. Given the so obtained sequence $A^n$, communicating
$X$ losslessly to Node 2 requires rate $H(X|A,Y)$. We split this rate into two rates $r_{1b}$ and $r_{1d}$,
such that the message corresponding to the first rate is carried over the broadcast link of rate
$R_b$ and the second on the direct link of rate $R_1$. Note that Node 2 can thus recover sequence
$X$ losslessly. The rate $H(X|A,Z)$, which is required to send $X$ losslessly to Node 3, is then
split into two parts, of rates $r_{2b}$ and $r_{2d}$. The message corresponding to the rate $r_{2b}$ is sent to
Node 3 on the broadcast link of rate $R_b$ by Node 1, while the message of rate $r_{2d}$ is sent by
Node 2 to Node 3. This way, Node 1 and Node 2 cooperate to transmit $X$ to Node 3. As per
the discussion above, the following inequalities have to be satisfied



$r_{2b} + r_{2d} + r_{1b} \ge H(X|A,Z)$,
$r_{1b} + r_{1d} \ge H(X|A,Y)$,
$R_1 \ge r_{1d}$,
$R_2 \ge r_{2d}$,
and $R_b \ge r_{1b} + r_{2b} + I(X;A)$.

Applying Fourier-Motzkin elimination [16, Appendix C] to the inequalities above, the inequalities
in (18) are obtained.
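
The elimination step can be sanity-checked numerically. The Python sketch below compares, on a small grid, the existence of non-negative rate splits satisfying the five inequalities above with the closed-form conditions (18a)-(18c). It assumes that all rates and information quantities are expressed as non-negative integers (for instance, in hundredths of a bit), in which case restricting the search to integer splits loses nothing; the function names and the test values are illustrative assumptions.

```python
from itertools import product


def feasible_by_splitting(R1, R2, Rb, I, HY, HZ):
    """Search integer splits (r1b, r2b) that satisfy the five inequalities.

    I, HY and HZ stand for I(X;A), H(X|A,Y) and H(X|A,Z) in integer units.
    """
    # Using the full direct-link rates is never worse, so r1d = R1, r2d = R2.
    r1d, r2d = R1, R2
    for r1b, r2b in product(range(Rb + 1), repeat=2):
        if (r2b + r2d + r1b >= HZ
                and r1b + r1d >= HY
                and Rb >= r1b + r2b + I):
            return True
    return False


def feasible_by_region(R1, R2, Rb, I, HY, HZ):
    """Closed-form membership test given by (18a), (18b) and (18c)."""
    return Rb >= I and R1 + Rb >= I + HY and R2 + Rb >= I + HZ


def regions_agree(I, HY, HZ, max_rate):
    """Check that both tests agree for all integer rate triples up to max_rate."""
    return all(
        feasible_by_splitting(R1, R2, Rb, I, HY, HZ)
        == feasible_by_region(R1, R2, Rb, I, HY, HZ)
        for R1, R2, Rb in product(range(max_rate + 1), repeat=3)
    )
```

For example, `regions_agree(2, 3, 4, 7)` compares the two descriptions on all rate triples with entries between 0 and 7.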

C. Example: Switching-Dependent Side Information 

We now consider the special case of the model in Fig. 4 in which the action $A \in \mathcal{A} =
\{0, 1, 2, 3\}$ acts as a switch that decides whether neither node, only Node 2, only Node 3, or both
nodes get to observe a side information $W$. The side information $W$ is jointly distributed with source $X$ according to the
joint pmf $p(x, w)$. Moreover, defining as $e$ an "erasure" symbol, the conditional pmf $p(y, z|x, a)$
is as follows: $Y = Z = e$ for $A = 0$ (neither Node 2 nor Node 3 observes the side information
$W$); $Y = W$ and $Z = e$ for $A = 1$ (only Node 2 observes the side information $W$); $Y = e$
and $Z = W$ for $A = 2$ (only Node 3 observes the side information $W$); and $Y = Z = W$ for
$A = 3$ (both nodes observe the side information $W$)$^1$. We also select the cost function such that
$\Lambda(j) = \lambda_j$ for $j \in \mathcal{A}$. When $R_1 = R_2 = 0$, this model reduces to the ones studied in [5, Sec.
III]. The following is a consequence of Proposition 2.

$^1$ This implies that $p(y, z|x, a) = \sum_{w} p(w|x)\,\delta(y - w)\,\delta(z - e)$ for $a = 1$ and similarly for other values of $a$.

Corollary 2. For the setting of switching-dependent side information described above, the rate-
cost region is given by

$R_b \ge I(X;A)$ (20a)

$R_1 + R_b \ge H(X) - p_1 I(X;W|A=1) - p_3 I(X;W|A=3)$ (20b)

and $R_2 + R_b \ge H(X) - p_2 I(X;W|A=2) - p_3 I(X;W|A=3)$ (20c)

where the mutual information terms are evaluated with respect to the joint pmf

$p(x, y, z, a) = p(x, a)\,p(y, z|a, x)$, (21)

for some pmf $p(a|x)$ such that $\sum_{j=0}^{3} p_j \lambda_j \le \Gamma$, where we have denoted $p_j = \Pr[A = j]$ for
$j \in \mathcal{A}$.

Proof: The region (20) is obtained from the rate-cost region (18) by noting that in (18b)
we have $I(X;A) + H(X|A,Y) = H(X) - I(X;Y|A)$ and similarly for (18c). ■
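
The right-hand sides of (20a)-(20c) can be computed directly from $p(x)$, $p(w|x)$ and a candidate $p(a|x)$. The following Python sketch is one possible evaluation, assuming pmfs are stored as dictionaries and actions are labelled 0 to 3 as above; the function names are illustrative assumptions. It uses the fact that $W$ depends on $A$ only through $X$ in this example, so that $I(X;W|A=a)$ is the mutual information of the channel $p(w|x)$ with input pmf $p(x|a)$. The cost constraint $\sum_j p_j \lambda_j \le \Gamma$ is not checked here and has to be verified separately using the returned pmf of $A$.

```python
from math import log2


def entropy(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


def mutual_info(p_in, channel):
    """I(input; output) in bits for input pmf p_in and channel[input][output]."""
    p_out = {}
    for s, ps in p_in.items():
        for t, pt in channel[s].items():
            p_out[t] = p_out.get(t, 0.0) + ps * pt
    cond = sum(ps * entropy(channel[s].values()) for s, ps in p_in.items())
    return entropy(p_out.values()) - cond


def corollary2_bounds(p_x, p_w_given_x, p_a_given_x):
    """Return the right-hand sides of (20a), (20b), (20c) and the pmf of A."""
    p_a = {a: sum(p_x[x] * p_a_given_x[x].get(a, 0.0) for x in p_x)
           for a in range(4)}

    def info_given(a):
        # I(X;W|A=a), computed from the posterior p(x|a) and the channel p(w|x).
        if p_a[a] == 0:
            return 0.0
        posterior = {x: p_x[x] * p_a_given_x[x].get(a, 0.0) / p_a[a]
                     for x in p_x}
        return mutual_info(posterior, p_w_given_x)

    h_x = entropy(p_x.values())
    shared = p_a[3] * info_given(3)
    return (mutual_info(p_x, p_a_given_x),
            h_x - p_a[1] * info_given(1) - shared,
            h_x - p_a[2] * info_given(2) - shared,
            p_a)
```

Sweeping `p_a_given_x` over a grid and keeping the pmfs that meet the cost budget gives a numerical approximation of the boundary of the region in Corollary 2.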

In the following, we will elaborate upon two specific instances of the switching-dependent 
side information example. 

Binary Symmetric Channel (BSC) between X and W: Let $(X, W)$ be binary and symmetric
so that $p(x) = p(w) = 1/2$ for $x, w \in \{0, 1\}$ and $\Pr[X \ne W] = \delta$ for $\delta \in [0, 1]$. Moreover, let
$\lambda_j = \infty$ for $j = 0, 3$ and $\lambda_j = 1$ otherwise. We set the action cost constraint to $\Gamma = 1$. Note
that, given this definition of $\Lambda(a)$, at each time, Node 1 can choose whether to provide the side
information $W$ to Node 2 or to Node 3 with no further constraints. By symmetry, it can be
seen that we can set the pmf $p(a|x)$ with $x \in \{0, 1\}$ and $a \in \{1, 2\}$ to be a BSC with transition
probability $q$. This implies that $p_1 = \Pr[A = 1] = q$ and $p_2 = \Pr[A = 2] = 1 - q$. We now evaluate
the inequality (20a) as $R_b \ge 0$; inequality (20b) as $R_1 + R_b \ge 1 - p_1 I(X;W|A=1) = 1 - qH(\delta)$;
and similarly inequality (20c) as $R_2 + R_b \ge 1 - (1-q)H(\delta)$. From these inequalities, it can be seen
that, in order to trace the boundary of the rate-cost region, in general, one needs to consider all
values of $q$ in the interval $[0, 1]$. This corresponds to appropriate time-sharing between providing
side information to Node 2 (for a fraction of time $q$) and Node 3 (for the remaining fraction of
time). Note that, as shown in [5, Sec. III], if $R_1 = R_2 = 0$, it is optimal to set $q = 1/2$, and thus
equally share the side information between Node 2 and Node 3, in order to minimize the rate
$R_b$. This difference is due to the fact that in the cascade model at hand, it can be advantageous
to provide more side information to one of the two nodes depending on the desired trade-off
between the rates $R_1$ and $R_2$ in the achievable rate-cost region.

July 13, 2012 DRAFT

Figure 6. The side information S-channel $p(w|x)$ used in the example of Sec. III-C.

S-Channel between X and W: We now consider the special case of Corollary 2 in which
$(X, W)$ are jointly distributed so that $p(x) = 1/2$ and $p(w|x)$ is the S-channel characterized by
$p(0|0) = 1 - \delta$ and $p(1|1) = 1$ (see Fig. 6). Moreover, we let $\lambda_1 = 1$, $\lambda_2 = 0$, $\lambda_0 = \lambda_3 = \infty$ as
above, while the cost constraint is set to $\Gamma \le 1$. As discussed in [5, Sec. III] for this example
with $R_1 = R_2 = 0$, providing side information to Node 2 is more costly and thus should be
done efficiently. In particular, given Fig. 6, it is expected that biasing the choice $A = 2$ when
X = 1 (i.e., providing side information to Node 2) may lead to some gain (see flU). Here we 
show that in the cascade model, this gain depends on the relative importance of rates $R_1$ and
$R_2$.

To this end, we set $p(a|x)$ as $p(1|0) = \alpha$ and $p(1|1) = \beta$ for $\alpha, \beta \in [0, 1]$. We now evaluate
the inequality (20a) as $R_b \ge 0$; inequality (20b) as

and inequality (20c) as

We now evaluate the minimum weighted sum-rate $R_1 + \eta R_2$ obtained from (22)-(23) for 
$R_b = 0.4$, $\delta = 0.6$ and both $\Gamma = 0.1$ and $\Gamma = 0.9$. The parameter $\eta \geq 0$ governs the relative 
importance of the two rates. For comparison, we also compute the performance attainable by 
imposing that the action $A$ be selected independently of $X$, which we refer to as the greedy 
approach [1]. Fig. 7 plots the difference between the two weighted sum-rates $R_1 + \eta R_2$. It can 
be seen that, as $\eta$ decreases and thus minimizing rate $R_1$ to Node 2 becomes more important, 
one can achieve larger gains by choosing the action $A$ to be dependent on $X$. Moreover, this 
gain is more significant when the action cost budget $\Gamma$ allows Node 2 to collect a larger fraction 
of the side information samples. 

Figure 7. Difference between the weighted sum-rate $R_1 + \eta R_2$ obtained with the greedy and with the optimal strategy as per 
Corollary 2 ($R_b = 0.4$, $\delta = 0.6$). 
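As a rough numerical companion to the S-channel example, the following Python sketch computes the conditional entropy $H(X|W)$, that is, the residual uncertainty about $X$ at a node that observes $W$, for the stated parameters $p(x) = 1/2$, $p(0|0) = 1 - \delta$ and $p(1|1) = 1$. It does not reproduce the rate expressions (22)-(23) or the curves of Fig. 7; the function names are illustrative assumptions and do not come from the paper.

```python
from math import log2


def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def s_channel_cond_entropy(delta, p_x0=0.5):
    """H(X|W) in bits for the S-channel p(0|0) = 1 - delta, p(1|1) = 1.

    p_x0 is P(X = 0); the example in the text uses p_x0 = 1/2.
    """
    p_w1 = p_x0 * delta + (1.0 - p_x0)  # P(W = 1)
    if p_w1 == 0.0:
        return 0.0
    # W = 0 can only be produced by X = 0, so H(X | W = 0) = 0.
    p_x0_given_w1 = p_x0 * delta / p_w1  # P(X = 0 | W = 1)
    return p_w1 * binary_entropy(p_x0_given_w1)


if __name__ == "__main__":
    # delta = 0.6 is the value used in the example of the text.
    print(round(s_channel_cond_entropy(0.6), 4))
```

For $\delta = 0$ the observation $W$ equals $X$ and the function returns zero, while for $\delta = 1$ the observation is constant and the function returns one bit, which brackets the usefulness of the side information in this example.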

D. Lossy Compression with Common Reconstruction Constraint 

In this section, we turn to the problem of characterizing the rate-distortion-cost region $\mathcal{R}(D_1, D_2, \Gamma)$ 
for $D_1, D_2 > 0$. In order to make the problem tractable$^2$, we impose the degradedness 
condition $X - (A, Y) - Z$ (as in [5]), which implies the factorization 

$$p(y, z|a, x) = p(y|a, x)\,p(z|a, y), \tag{24}$$

$^2$ As noted earlier, the problem is open even in the case with no VM [5]. 
and that the reconstructions at Nodes 2 and 3 be reproducible by Node 1. As discussed, this 
latter condition is referred to as the CR constraint [13]. Note that this constraint is automatically 
satisfied in the lossless case. To be more specific, an $(n, R_1, R_2, R_b, D_1, D_2, \Gamma, \epsilon)$ code is defined 
per Definition 4 with the difference that there are two additional functions for the encoder, 
namely 

$$\psi_1: \mathcal{X}^n \to \hat{\mathcal{X}}_1^n \tag{25a}$$
$$\text{and } \psi_2: \mathcal{X}^n \to \hat{\mathcal{X}}_2^n, \tag{25b}$$

which map the source sequence into the estimated sequences at the encoder, namely $\psi_1(X^n)$ 
and $\psi_2(X^n)$, respectively; and the CR requirements are imposed, i.e., 

$$\Pr[\psi_1(X^n) \neq \mathrm{h}_1(M_1, M_b, Y^n)] \leq \epsilon \tag{26a}$$
$$\text{and } \Pr[\psi_2(X^n) \neq \mathrm{h}_2(M_2, M_b, Z^n)] \leq \epsilon, \tag{26b}$$

so that the encoder's estimates $\psi_1(\cdot)$ and $\psi_2(\cdot)$ are equal to the decoders' estimates (cf. (16)-(17)) 
with high probability. 

Proposition 3. The rate-distortion region $\mathcal{R}(D_1, D_2, \Gamma)$ for the cascade-broadcast source coding 
problem illustrated in Fig. 4 under the CR constraint and the degradedness condition (24) is 
given by the union of all rate triples $(R_1, R_2, R_b)$ that satisfy the conditions 

$$R_b \geq I(X; A) \tag{27a}$$
$$R_1 + R_b \geq I(X; A) + I(X; \hat{X}_1, \hat{X}_2|A, Y) \tag{27b}$$
$$R_2 + R_b \geq I(X; A) + I(X; \hat{X}_2|A, Z) \tag{27c}$$
$$\text{and } R_1 + R_2 + R_b \geq I(X; A) + I(X; \hat{X}_2|A, Z) + I(X; \hat{X}_1|A, Y, \hat{X}_2), \tag{27d}$$

where the mutual information terms are evaluated with respect to the joint pmf 

$$p(x, y, z, a, \hat{x}_1, \hat{x}_2) = p(x)\,p(a|x)\,p(y|x, a)\,p(z|a, y)\,p(\hat{x}_1, \hat{x}_2|x, a), \tag{28}$$

such that the inequalities 

$$E[d_j(X, \hat{X}_j)] \leq D_j, \text{ for } j = 1, 2, \tag{29a}$$
$$\text{and } E[\Lambda(A)] \leq \Gamma, \tag{29b}$$

are satisfied. 
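To make the structure of the rate conditions (27a)-(27d) concrete, the sketch below checks whether a candidate rate triple satisfies them once the four mutual information terms have been evaluated for some fixed joint pmf of the form (28). The function name, the argument names and the numerical values in the usage lines are hypothetical placeholders chosen for illustration; they are not taken from the paper, and the distortion and cost constraints (29) are assumed to have been verified separately.

```python
def satisfies_prop3_rates(r1, r2, rb, i_xa, i_x_x1x2_ay, i_x_x2_az,
                          i_x_x1_ayx2, tol=1e-12):
    """Return True if (r1, r2, rb) meets the four conditions (27a)-(27d).

    i_xa         : I(X; A)
    i_x_x1x2_ay  : I(X; X1hat, X2hat | A, Y)
    i_x_x2_az    : I(X; X2hat | A, Z)
    i_x_x1_ayx2  : I(X; X1hat | A, Y, X2hat)
    All quantities are assumed to be in the same unit (e.g., bits).
    """
    return (
        rb + tol >= i_xa                                        # (27a)
        and r1 + rb + tol >= i_xa + i_x_x1x2_ay                 # (27b)
        and r2 + rb + tol >= i_xa + i_x_x2_az                   # (27c)
        and r1 + r2 + rb + tol >= i_xa + i_x_x2_az + i_x_x1_ayx2  # (27d)
    )


if __name__ == "__main__":
    # Hypothetical information values, for illustration only.
    info = dict(i_xa=0.2, i_x_x1x2_ay=0.5, i_x_x2_az=0.4, i_x_x1_ayx2=0.3)
    print(satisfies_prop3_rates(0.5, 0.3, 0.4, **info))
    print(satisfies_prop3_rates(0.5, 0.3, 0.1, **info))
```

The first call uses a broadcast rate large enough to carry the action description, while the second violates (27a), which illustrates that the action description of rate $I(X; A)$ must always fit on the broadcast link.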

Remark 5. If either $R_1 = 0$ or $R_b = 0$ and the side information $Y$ is independent of the action 
$A$ given $X$, i.e., $p(y|a, x) = p(y|x)$, the rate-distortion region $\mathcal{R}(D_1, D_2, \Gamma)$ of Proposition 3 
reduces to the one derived in [12, Proposition 10]. 

The proof of the converse is provided in Appendix B. The achievability follows similarly to 
Proposition 2. Specifically, Node 1 first maps sequence $X^n$ into the action sequence $A^n$. This 
mapping requires a codebook of rate $I(X; A)$. This rate has to be conveyed over link $R_b$ by the 
definition of the problem and is thus received by both Node 2 and Node 3. The source sequence 
$X^n$ is mapped into the estimate $\hat{X}_2^n$ for Node 3 using a codebook of rate $I(X; \hat{X}_2|A)$ for each 
sequence $A^n$. Communicating $\hat{X}_2^n$ to Node 2 requires rate $I(X; \hat{X}_2|A, Y)$ by the Wyner-Ziv 
theorem. We split this rate into two rates $r_{2b}$ and $r_{2d}$, such that the message corresponding to the 
first rate is carried over the broadcast link of rate $R_b$ and the second on the direct link of rate 
$R_1$. Note that Node 2 can thus recover sequence $\hat{X}_2^n$. Communicating $\hat{X}_2^n$ to Node 3 requires 
rate $I(X; \hat{X}_2|A, Z)$ by the Wyner-Ziv theorem. We split this rate into two rates $r_{0b}$ and $r_{0d}$. The 
message corresponding to the rate $r_{0b}$ is sent to Node 3 on the broadcast link of rate $R_b$ 
by Node 1, while the message of rate $r_{0d}$ is sent by Node 2 to Node 3. This way, Node 1 and 
Node 2 cooperate to transmit $\hat{X}_2^n$ to Node 3. Finally, the source sequence $X^n$ is mapped by 
Node 1 into the estimate $\hat{X}_1^n$ for Node 2 using a codebook of rate $I(X; \hat{X}_1|A, \hat{X}_2)$ for each pair 
of sequences $(A^n, \hat{X}_2^n)$. Using Wyner-Ziv coding, this rate is reduced to $I(X; \hat{X}_1|A, Y, \hat{X}_2)$ 
and split into two rates $r_{1b}$ and $r_{1d}$, which are sent through links $R_b$ and $R_1$, respectively. As 
per the discussion above, the following inequalities have to be satisfied 



\begin{align*}
r_{0b} + r_{0d} + r_{2b} &\geq I(X; \hat{X}_2|A, Z), \\
r_{2b} + r_{2d} &\geq I(X; \hat{X}_2|A, Y), \\
r_{1b} + r_{1d} &\geq I(X; \hat{X}_1|A, Y, \hat{X}_2), \\
R_1 &\geq r_{1d} + r_{2d}, \\
R_2 &\geq r_{0d}, \\
\text{and } R_b &\geq r_{1b} + r_{2b} + r_{0b} + I(X; A).
\end{align*}

Applying Fourier-Motzkin elimination [16, Appendix C] to the inequalities above, the inequalities 
in (27) are obtained. 
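As a partial check of the elimination step, the following worked derivation shows how the sum-rate conditions (27b)-(27d) follow from the split-rate inequalities by direct summation. It is only a sketch of the necessity direction, under the standard assumption (not stated explicitly in the text) that all split rates $r_{0b}, r_{0d}, r_{1b}, r_{1d}, r_{2b}, r_{2d}$ are non-negative; the full elimination is the one in the cited reference.

```latex
\begin{align*}
R_1 + R_b &\geq (r_{1d} + r_{2d}) + (r_{1b} + r_{2b} + r_{0b}) + I(X;A) \\
          &\geq (r_{1b} + r_{1d}) + (r_{2b} + r_{2d}) + I(X;A) \\
          &\geq I(X;\hat{X}_1|A,Y,\hat{X}_2) + I(X;\hat{X}_2|A,Y) + I(X;A) \\
          &=    I(X;A) + I(X;\hat{X}_1,\hat{X}_2|A,Y), \\[4pt]
R_2 + R_b &\geq r_{0d} + (r_{1b} + r_{2b} + r_{0b}) + I(X;A) \\
          &\geq (r_{0b} + r_{0d} + r_{2b}) + I(X;A) \\
          &\geq I(X;A) + I(X;\hat{X}_2|A,Z), \\[4pt]
R_1 + R_2 + R_b &\geq (r_{1d} + r_{2d}) + r_{0d} + (r_{1b} + r_{2b} + r_{0b}) + I(X;A) \\
          &\geq (r_{0b} + r_{0d} + r_{2b}) + (r_{1b} + r_{1d}) + I(X;A) \\
          &\geq I(X;A) + I(X;\hat{X}_2|A,Z) + I(X;\hat{X}_1|A,Y,\hat{X}_2).
\end{align*}
```

The first chain uses the chain rule $I(X;\hat{X}_2|A,Y) + I(X;\hat{X}_1|A,Y,\hat{X}_2) = I(X;\hat{X}_1,\hat{X}_2|A,Y)$, and condition (27a) follows from the last split-rate inequality alone by dropping the non-negative terms $r_{1b} + r_{2b} + r_{0b}$.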

IV. Adaptive Actions 

In this section, we assume that actions taken by the nodes are not only a function of the 
message $M_2$ for the model of Fig. 3 or $M_b$ for the models of Fig. 4 and Fig. 5, respectively, but 
also a function of the past observed side information samples. Following [15], we refer to this 
case as the one with adaptive actions. Note that for the cascade-broadcast problem, we consider 
the model in Fig. 5, which differs from the one in Fig. 4 considered thus far in that the side 
information $Z$ is not available at Node 3. At this time, it appears to be problematic to define 
adaptive actions in the presence of two nodes that observe different side information sequences. 
For the cascade model in Fig. 3, an $(n, R_1, R_2, D_1, D_2, \Gamma)$ code is defined per Definition 1 with 
the difference that the action encoder (3) is modified to be 

$$\ell_i: [1, 2^{nR_2}] \times \mathcal{Z}^{i-1} \to \mathcal{A}, \tag{30}$$

which maps the message $M_2$ and the past observed decoder side information sequence $Z^{i-1}$ into 
the $i$th symbol of the action sequence $A_i$. Moreover, for the cascade-broadcast model of Fig. 5, 
the "action" function (15) in Definition 4 is modified as 

$$\ell_i: [1, 2^{nR_b}] \times \mathcal{Y}^{i-1} \to \mathcal{A}, \tag{31}$$

which maps the message $M_b$ and the past observed decoder side information sequence $Y^{i-1}$ into 
the $i$th symbol of the action sequence $A_i$. 

Proposition 4. The rate-distortion-cost region $\mathcal{R}(D_1, D_2, \Gamma)$ for the cascade source coding 
problem illustrated in Fig. 3 with adaptive action-dependent side information is given by the 
rate region described in Proposition 1. 

Proposition 5. The rate-distortion-cost region $\mathcal{R}(D_1, D_2, \Gamma)$ for the cascade-broadcast source 
coding problem under the CR constraint illustrated in Fig. 5 with adaptive action-dependent side 
information is given by the region described in Proposition 3 by setting $Z = \emptyset$. 

Remark 6. The results above show that enabling adaptive actions does not increase the achievable 
rate-distortion-cost region. These results generalize the observations in [15] for the point-to-point 
setting, wherein a similar conclusion is drawn. 

To establish the propositions above, we only need to prove the converse. The proofs for 
Proposition 4 and Proposition 5 are given in Appendix A and B, respectively. 

V. Concluding Remarks 

In an increasing number of applications, communication networks are expected to be able to 
convey not only data, but also information about control for actuation over multiple hops. In 
this work, we have tackled the analysis of a baseline communication model with three nodes 
connected in a cascade with the possible presence of an additional broadcast link. We have 
characterized the optimal trade-off between rate, distortion and cost for actuation in a number 
of relevant cases of interest. In general, the results point to the advantages of leveraging a 
joint representation of data and control information in order to utilize in the most efficient way 
the available communication links. Specifically, in all the considered models, a layered coding 
strategy, possibly coupled with rate splitting, has been proved to be optimal. This strategy is 
such that the base layer has the double role of guiding the actions of the downstream nodes and 
of providing a coarse description of the source, similar to [1]. Moreover, it is shown that this 
base compression layer should be designed in a way that depends on the network topology and 
on the relative cost of activating the different links. 

VI. Acknowledgments 

The work of O. Simeone is supported by the U.S. National Science Foundation under grant 
CCF-0914899, and the work of U. Mitra by ONR N00014-09-1-0700, NSF CCF-0917343 and 
DOT CA-26-7084-00. 

Appendix A: Converse Proof for Propositions 1 and 4 

Here, we prove the converse part of Proposition 4. Since the setting of Proposition 1 is more 
restrictive, as it does not allow for adaptive actions, the converse proof for Proposition 1 follows 
immediately. For any $(n, R_1, R_2, D_1 + \epsilon, D_2 + \epsilon, \Gamma + \epsilon)$ code, we have 

\begin{align*}
nR_1 &\geq H(M_1) \\
&\geq H(M_1|Y^n) \\
&\overset{(a)}{=} I(M_1; X^n, Z^n|Y^n) \\
&= H(X^n, Z^n|Y^n) - H(X^n, Z^n|M_1, Y^n) \\
&= H(X^n|Y^n) + H(Z^n|X^n, Y^n) - H(Z^n|Y^n, M_1) - H(X^n|Z^n, Y^n, M_1) \\
&\overset{(b)}{=} H(X^n|Y^n) + H(Z^n|X^n, Y^n, M_1, M_2) - H(Z^n|Y^n, M_1, M_2) - H(X^n|Z^n, Y^n, M_1, M_2) \\
&\overset{(c)}{=} H(X^n|Y^n) - H(X^n|Z^n, Y^n, M_1, M_2, A^n, \hat{X}_1^n) \\
&\quad + \sum_{i=1}^{n} \left[ H(Z_i|Z^{i-1}, X^n, Y^n, M_1, M_2) - H(Z_i|Z^{i-1}, Y^n, M_1, M_2) \right] \\
&\overset{(d)}{\geq} \sum_{i=1}^{n} \left[ H(X_i|Y_i) - H(X_i|X^{i-1}, Y_i, M_2, A^i, Z^n, \hat{X}_{1i}) \right] \\
&\quad + \sum_{i=1}^{n} \left[ H(Z_i|Z^{i-1}, X^n, Y^n, M_1, M_2, A_i) - H(Z_i|Z^{i-1}, Y^n, M_1, M_2, A_i) \right] \\
&\overset{(e)}{=} \sum_{i=1}^{n} \left[ I(X_i; \hat{X}_{1i}, A_i, U_i|Y_i) + H(Z_i|Y_i, A_i) - H(Z_i|Y_i, A_i) \right] \\
&= \sum_{i=1}^{n} I(X_i; \hat{X}_{1i}, A_i, U_i|Y_i), \tag{32}
\end{align*}

where (a) follows since $M_1$ is a function of $(X^n, Y^n)$; (b) follows since $M_2$ is a function 
of $(M_1, Y^n)$; (c) follows since $A_i$ is a function of $(M_2, Z^{i-1})$ and since $\hat{X}_1^n$ is a function of 
$(M_1, Y^n)$; (d) follows since conditioning decreases entropy and since $X^n$ and $Y^n$ are i.i.d.; and 
(e) follows by defining $U_i = (M_2, X^{i-1}, A^{i-1}, Z^{n \setminus i})$ and since $(Z^{i-1}, X^n, Y^{n \setminus i}, M_1, M_2) - (A_i, Y_i) - Z_i$ 
form a Markov chain by construction. We also have 

\begin{align*}
nR_2 &\geq H(M_2) \\
&= I(M_2; X^n, Y^n, Z^n) \\
&= H(X^n, Y^n, Z^n) - H(X^n, Y^n, Z^n|M_2) \\
&= H(X^n, Y^n) + H(Z^n|X^n, Y^n) - H(Z^n|M_2) - H(X^n, Y^n|M_2, Z^n) \\
&= \sum_{i=1}^{n} \left[ H(X_i, Y_i) + H(Z_i|Z^{i-1}, X^n, Y^n) - H(Z_i|Z^{i-1}, M_2) - H(X_i, Y_i|X^{i-1}, Y^{i-1}, M_2, Z^n) \right] \\
&\overset{(a)}{=} \sum_{i=1}^{n} \left[ H(X_i, Y_i) + H(Z_i|Z^{i-1}, X^n, Y^n, M_2, A_i) - H(Z_i|Z^{i-1}, M_2, A_i) \right. \\
&\qquad \left. - H(X_i, Y_i|X^{i-1}, Y^{i-1}, M_2, Z^n, A^i) \right] \\
&\overset{(b)}{\geq} \sum_{i=1}^{n} \left[ H(X_i, Y_i) + H(Z_i|X_i, Y_i, A_i) - H(Z_i|A_i) - H(X_i, Y_i|U_i, A_i, Z_i) \right], \tag{33}
\end{align*}

where (a) follows because $M_2$ is a function of $(M_1, Y^n)$ and thus of $(X^n, Y^n)$ and because $A_i$ is 
a function of $(M_2, Z^{i-1})$; and (b) follows since conditioning decreases entropy, since the Markov 
chain relationship $Z_i - (X_i, Y_i, A_i) - (X^{n \setminus i}, Y^{n \setminus i}, M_2)$ holds and by using the definition of $U_i$. 

Defining $Q$ to be a random variable uniformly distributed over $[1, n]$ and independent of all the 
other random variables and with $X = X_Q$, $Y = Y_Q$, $Z = Z_Q$, $A = A_Q$, $\hat{X}_1 = \hat{X}_{1Q}$, $\hat{X}_2 = \hat{X}_{2Q}$ 
and $U = (U_Q, Q)$, from (32) we have 

$$R_1 \geq I(X; \hat{X}_1, A, U|Y, Q) \overset{(a)}{\geq} H(X|Y) - H(X|\hat{X}_1, A, U, Y) = I(X; \hat{X}_1, A, U|Y),$$

where in (a) we have used the fact that $(X^n, Y^n)$ are i.i.d. and conditioning reduces entropy. 
Moreover, from (33) we have 

\begin{align*}
R_2 &\geq H(X, Y|Q) + H(Z|X, Y, A, Q) - H(Z|A, Q) - H(X, Y|U, A, Z, Q) \\
&\overset{(a)}{\geq} H(X, Y) + H(Z|X, Y, A) - H(Z|A) - H(X, Y|U, A, Z) \\
&= I(X, Y; U, A, Z) - I(Z; X, Y|A) \\
&= I(X, Y; A) + I(X, Y; U|A, Z),
\end{align*}

where (a) follows since $(X^n, Y^n)$ are i.i.d., since conditioning decreases entropy, by the definition 
of $U$ and by the problem definition. We note that the defined random variables factorize as © 
since we have the Markov chain relationship $X - (A, Y) - Z$ by the problem definition and 
that $\hat{X}_2$ is a function $f(U, Z)$ of $U$ and $Z$ by the definition of $U$. Moreover, from the cost and 
distortion constraints ©-©, we have 

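$$D_j + \epsilon \geq \frac{1}{n} \sum_{i=1}^{n} E[d_j(X_i, \hat{X}_{ji})] = E[d_j(X, \hat{X}_j)], \text{ for } j = 1, 2, \tag{34a}$$
$$\text{and } \Gamma + \epsilon \geq \frac{1}{n} \sum_{i=1}^{n} E[\Lambda(A_i)] = E[\Lambda(A)]. \tag{34b}$$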

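To bound the cardinality of the auxiliary random variable $U$, we fix $p(z|y, a)$ and factorize the 
joint pmf $p(x, y, z, a, u, \hat{x}_1)$ as 

$$p(x, y, z, a, u, \hat{x}_1) = p(u)\,p(\hat{x}_1, a, x, y|u)\,p(z|y, a).$$

Therefore, for fixed $p(z|y, a)$, the quantities (8a)-(10c) can be expressed in terms of integrals 
given by $\int g_j(p(\hat{x}_1, a, x, y|u))\,dF(u)$, for $j = 1, \ldots, |\mathcal{X}||\mathcal{Y}||\mathcal{A}| + 3$, of functions $g_j(\cdot)$ that are 
continuous on the space of probabilities over alphabet $|\mathcal{X}| \times |\mathcal{Y}| \times |\mathcal{A}| \times |\hat{\mathcal{X}}_1|$. Specifically, we have 
$g_j$ for $j = 1, \ldots, |\mathcal{X}||\mathcal{Y}||\mathcal{A}| - 1$, given by the pmf $p(a, x, y)$ for all values of $x \in \mathcal{X}$, $y \in \mathcal{Y}$ and 
$a \in \mathcal{A}$ (except one), $g_{|\mathcal{X}||\mathcal{Y}||\mathcal{A}|} = H(X|A, Y, \hat{X}_1, U = u)$, $g_{|\mathcal{X}||\mathcal{Y}||\mathcal{A}|+1} = H(X, Y|A, Z, U = u)$, 
and $g_{|\mathcal{X}||\mathcal{Y}||\mathcal{A}|+1+j} = E[d_j(X, \hat{X}_j)|U = u]$, for $j = 1, 2$. The proof is concluded by invoking the 
Fenchel-Eggleston-Caratheodory theorem [16, Appendix C]. 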

Appendix B: Proof of Proposition 3 

Here, we prove the converse parts of Proposition 3 and Proposition 5. We start by proving 
Proposition 3. The proof of Proposition 5 will follow by setting $Z = \emptyset$, and noting that in the 
proof below the action $A_i$ can be made to be a function of $Y^{i-1}$, in addition to being a function 
of $M_b$, without modifying any steps of the proof. By the CR requirements (26), we first observe 
that for any $(n, R_1, R_2, R_b, D_1 + \epsilon, D_2 + \epsilon, \Gamma + \epsilon)$ code, we have the Fano inequalities 

$$H(\psi_1(X^n)|\mathrm{h}_1(M_1, M_b, Y^n)) \leq n\delta(\epsilon), \tag{35a}$$
$$\text{and } H(\psi_2(X^n)|\mathrm{h}_2(M_2, M_b, Z^n)) \leq n\delta(\epsilon), \tag{35b}$$

where $\delta(\epsilon)$ denotes any function such that $\delta(\epsilon) \to 0$ if $\epsilon \to 0$. Next, we have 

\begin{align*}
nR_b &\geq H(M_b) \\
&\overset{(a)}{=} I(M_b; X^n, Y^n) \\
&= H(X^n, Y^n) - H(X^n, Y^n|M_b) \\
&= H(X^n) + H(Y^n|X^n, M_b) - H(X^n, Y^n|M_b) \\
&\overset{(b)}{=} \sum_{i=1}^{n} \left[ H(X_i) + H(Y_i|Y^{i-1}, X^n, M_b, A_i) - H(X_i, Y_i|X^{i-1}, Y^{i-1}, M_b, A_i) \right] \\
&= \sum_{i=1}^{n} \left[ H(X_i) + H(Y_i|Y^{i-1}, X^n, M_b, A_i) - H(X_i|X^{i-1}, Y^{i-1}, M_b, A_i) \right. \\
&\qquad \left. - H(Y_i|X^{i}, Y^{i-1}, M_b, A_i) \right] \\
&\overset{(c)}{=} \sum_{i=1}^{n} \left[ H(X_i) + H(Y_i|X_i, A_i) - H(X_i|X^{i-1}, Y^{i-1}, M_b, A_i) - H(Y_i|X_i, A_i) \right] \\
&\overset{(d)}{\geq} \sum_{i=1}^{n} \left[ H(X_i) - H(X_i|A_i) \right], \tag{36}
\end{align*}

where (a) follows since $M_b$ is a function of $X^n$; (b) follows since $A_i$ is a function of $M_b$ and 
since $X^n$ is i.i.d.; (c) follows since $(Y^{i-1}, X^{n \setminus i}, M_b) - (A_i, X_i) - Y_i$ forms a Markov chain by 
problem definition; and (d) follows since conditioning reduces entropy. In the following, for simplicity 
of notation, we write $\mathrm{h}_1, \mathrm{h}_2, \psi_1, \psi_2$ for the values of the corresponding functions in Sec. III-D. Next, 
we can also write 

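\begin{align*}
n(R_1 + R_b) &\geq H(M_1, M_b) \\
&\overset{(a)}{=} I(M_1, M_b; X^n, Y^n, Z^n) \\
&= H(X^n, Y^n, Z^n) - H(X^n, Y^n, Z^n|M_1, M_b) \\
&= H(X^n) + H(Y^n, Z^n|X^n) - H(Y^n, Z^n|M_1, M_b) - H(X^n|Y^n, Z^n, M_1, M_b) \\
&\overset{(b)}{=} H(X^n) + H(Y^n, Z^n|X^n, M_b) - H(Y^n|M_1, M_b) \\
&\quad - H(Z^n|M_1, M_b, Y^n, A^n) - H(X^n|Y^n, Z^n, M_1, M_b, M_2, A^n) \\
&\overset{(c)}{=} \sum_{i=1}^{n} \left[ H(X_i) + H(Y_i, Z_i|X_i, A_i) - H(Y_i|Y^{i-1}, M_1, M_b, A_i) \right. \\
&\qquad \left. - H(Z_i|Z^{i-1}, M_1, M_b, Y^n, A^n) - H(X_i|X^{i-1}, Y^n, Z^n, M_1, M_b, A^n, M_2, \mathrm{h}_1, \mathrm{h}_2) \right] \\
&\overset{(d)}{\geq} \sum_{i=1}^{n} \left[ H(X_i) + H(Y_i|X_i, A_i) + H(Z_i|Y_i, A_i) - H(Y_i|A_i) - H(Z_i|Y_i, A_i) \right. \\
&\qquad \left. - H(X_i|Y_i, A_i, \mathrm{h}_{1i}, \mathrm{h}_{2i}) \right] \\
&= \sum_{i=1}^{n} \left[ I(X_i; Y_i, A_i, \mathrm{h}_{1i}, \mathrm{h}_{2i}) - I(Y_i; X_i|A_i) \right] \\
&= \sum_{i=1}^{n} \left[ I(X_i; Y_i, A_i, \mathrm{h}_{1i}, \mathrm{h}_{2i}, \psi_{1i}, \psi_{2i}) - I(X_i; \psi_{1i}, \psi_{2i}|Y_i, A_i, \mathrm{h}_{1i}, \mathrm{h}_{2i}) - I(Y_i; X_i|A_i) \right] \\
&\overset{(e)}{\geq} \sum_{i=1}^{n} \left[ I(X_i; Y_i, A_i, \psi_{1i}, \psi_{2i}) - H(\psi_{1i}, \psi_{2i}|Y_i, A_i, \mathrm{h}_{1i}, \mathrm{h}_{2i}) \right. \\
&\qquad \left. + H(\psi_{1i}, \psi_{2i}|Y_i, A_i, \mathrm{h}_{1i}, \mathrm{h}_{2i}, X_i) - I(Y_i; X_i|A_i) \right] \\
&\overset{(f)}{\geq} \sum_{i=1}^{n} \left[ I(X_i; Y_i, A_i, \psi_{1i}, \psi_{2i}) - I(Y_i; X_i|A_i) \right] + n\delta(\epsilon) \\
&= \sum_{i=1}^{n} \left[ I(X_i; A_i) + I(X_i; \psi_{1i}, \psi_{2i}|Y_i, A_i) \right] + n\delta(\epsilon), \tag{37}
\end{align*}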

where (a) follows because $(M_1, M_b)$ is a function of $X^n$; (b) follows because $M_b$ is a function 
of $X^n$, $A^n$ is a function of $M_b$ and $M_2$ is a function of $(M_1, M_b, Y^n)$; (c) follows since 
$H(Y^n, Z^n|X^n, M_b) = \sum_{i=1}^{n} H(Y_i, Z_i|Y^{i-1}, Z^{i-1}, X^n, M_b, A_i) = \sum_{i=1}^{n} H(Y_i, Z_i|X_i, A_i)$ and 
since $\mathrm{h}_1$ and $\mathrm{h}_2$ are functions of $(M_1, M_b, Y^n)$ and $(M_2, M_b, Z^n)$, respectively, and because 
$(Y_i, Z_i) - (X_i, A_i) - (X^{n \setminus i}, Y^{i-1}, Z^{i-1}, M_b)$ forms a Markov chain; (d) follows since conditioning 
reduces entropy, since the side information VM follows $p(y^n, z^n|a^n, x^n) = \prod_{i=1}^{n} p_{Y|A,X}(y_i|a_i, x_i)\,
p_{Z|A,Y}(z_i|a_i, y_i)$ from (24) and because $Z_i - (Y_i, A_i) - (Y^{n \setminus i}, Z^{i-1}, M_1, M_b)$ forms a Markov 
chain; (e) follows by the chain rule for mutual information and the fact that mutual information 
is non-negative; and (f) follows by the Fano inequality (35) and because entropy is non-negative. 
We can also write 

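\begin{align*}
n(R_2 + R_b) &\geq H(M_2, M_b) \\
&\overset{(a)}{=} I(M_2, M_b; X^n, Y^n, Z^n) \\
&= H(X^n, Y^n, Z^n) - H(X^n, Y^n, Z^n|M_2, M_b) \\
&= H(X^n) + H(Y^n, Z^n|X^n, M_b) - H(Z^n|M_2, M_b) - H(X^n, Y^n|Z^n, M_2, M_b) \\
&\overset{(b)}{=} \sum_{i=1}^{n} \left[ H(X_i) + H(Y_i, Z_i|Y^{i-1}, Z^{i-1}, X^n, M_b, A_i) - H(Z_i|Z^{i-1}, M_2, M_b, A_i) \right. \\
&\qquad \left. - H(X_i, Y_i|X^{i-1}, Y^{i-1}, M_2, M_b, Z^n, A_i) \right] \\
&= \sum_{i=1}^{n} \left[ H(X_i, Y_i) - H(Y_i|X_i) + H(Y_i, Z_i|Y^{i-1}, Z^{i-1}, X^n, M_b, A_i) \right. \\
&\qquad \left. - H(Z_i|Z^{i-1}, M_2, M_b, A_i) - H(X_i, Y_i|X^{i-1}, Y^{i-1}, M_2, M_b, Z^n, A_i) \right] \\
&\overset{(c)}{=} \sum_{i=1}^{n} \left[ H(X_i, Y_i) - H(Y_i|X_i) + H(Y_i|X_i, A_i) + H(Z_i|A_i, Y_i, X_i) \right. \\
&\qquad \left. - H(Z_i|Z^{i-1}, M_2, M_b, A_i) - H(X_i, Y_i|X^{i-1}, Y^{i-1}, M_2, M_b, Z^n, A_i) \right] \\
&\overset{(d)}{=} \sum_{i=1}^{n} \left[ H(X_i, Y_i) - I(Y_i; A_i|X_i) + H(Z_i|A_i, Y_i, X_i) - H(Z_i|Z^{i-1}, M_2, M_b, A_i) \right. \\
&\qquad \left. - H(X_i, Y_i|X^{i-1}, Y^{i-1}, M_2, M_b, \mathrm{h}_2, Z^n, A_i) \right] \\
&\overset{(e)}{\geq} \sum_{i=1}^{n} \left[ H(X_i, Y_i) + I(X_i; A_i) - I(Y_i, X_i; A_i) + H(Z_i|A_i, Y_i, X_i) \right. \\
&\qquad \left. - H(Z_i|A_i) - H(X_i, Y_i|\mathrm{h}_{2i}, A_i, Z_i) \right] \\
&= \sum_{i=1}^{n} \left[ I(X_i, Y_i; \mathrm{h}_{2i}, A_i, Z_i, \psi_{2i}) - I(X_i, Y_i; \psi_{2i}|\mathrm{h}_{2i}, A_i, Z_i) + I(X_i; A_i) \right. \\
&\qquad \left. - I(Y_i, X_i; A_i) - I(X_i, Y_i; Z_i|A_i) \right] \\
&\geq \sum_{i=1}^{n} \left[ I(X_i, Y_i; A_i, Z_i, \psi_{2i}) - H(\psi_{2i}|\mathrm{h}_{2i}, A_i, Z_i) + H(\psi_{2i}|\mathrm{h}_{2i}, A_i, X_i, Y_i, Z_i) \right. \\
&\qquad \left. + I(X_i; A_i) - I(X_i, Y_i; Z_i, A_i) \right] \\
&\overset{(f)}{\geq} \sum_{i=1}^{n} \left[ I(X_i; A_i) + I(X_i, Y_i; \psi_{2i}|A_i, Z_i) \right] + n\delta(\epsilon), \tag{38}
\end{align*}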

where (a) follows since $M_b$ is a function of $X^n$ and because $M_2$ is a function of $(M_1, M_b, Y^n)$ and 
thus of $(X^n, Y^n)$; (b) follows since $A_i$ is a function of $M_b$ and since $X^n$ is i.i.d.; (c) follows since 
$(Y_i, Z_i) - (X_i, A_i) - (X^{n \setminus i}, Y^{i-1}, Z^{i-1}, M_b)$ forms a Markov chain and since $p(y^n, z^n|a^n, x^n) 
= \prod_{i=1}^{n} p_{Y|A,X}(y_i|a_i, x_i)\,p_{Z|A,Y}(z_i|a_i, y_i)$; (d) follows since $\mathrm{h}_2$ is a function of $(M_2, M_b, Z^n)$; (e) 
follows since conditioning reduces entropy; and (f) follows since entropy is non-negative and 
using Fano's inequality. Moreover, with the definition $M = (M_1, M_2, M_b)$, we have the chain 
of inequalities 

\begin{align*}
n(R_1 + R_2 + R_b) &\geq H(M) \\
&\overset{(a)}{=} I(M; X^n, Y^n, Z^n) \\
&= H(X^n, Y^n, Z^n) - H(X^n, Y^n, Z^n|M) \\
&= H(X^n) + H(Y^n, Z^n|X^n, M_b) - H(X^n, Y^n, Z^n|M) \\
&= I(X^n; A^n) + H(Y^n, Z^n|X^n, M_b) - H(Y^n, Z^n|M) \\
&\quad - H(X^n|Y^n, Z^n, M) + H(X^n|A^n) \\
&= I(X^n; A^n) + H(Y^n, Z^n|X^n, M_b) - H(Y^n, Z^n|M) + I(X^n; Y^n, Z^n, M|A^n) \\
&= I(X^n; A^n) + I(M; X^n|Y^n, A^n, Z^n) + H(Y^n, Z^n|X^n, M_b) \\
&\quad - H(Y^n, Z^n|M) + I(X^n; Y^n, Z^n|A^n) \\
&\overset{(b)}{=} H(X^n) - H(X^n|A^n) + H(X^n|Y^n, A^n, Z^n) - H(X^n|Y^n, A^n, Z^n, M) \\
&\quad - H(Y^n, Z^n|M) + H(Y^n, Z^n|A^n) \\
&= H(X^n) - H(X^n|A^n) + H(X^n, Y^n, Z^n|A^n) - H(X^n|Y^n, A^n, Z^n, M) \\
&\quad - H(Y^n, Z^n|M) \\
&= H(X^n) + H(Y^n, Z^n|A^n, X^n) - H(X^n|Y^n, A^n, Z^n, M) - H(Y^n, Z^n|M) \\
&\overset{(c)}{=} \sum_{i=1}^{n} \left[ H(X_i) + H(Y_i|A_i, X_i) + H(Z_i|A_i, Y_i) - H(X_i|X^{i-1}, Y^n, A^n, Z^n, M) \right. \\
&\qquad \left. - H(Z_i|Z^{i-1}, M, A_i) - H(Y_i|Y^{i-1}, Z^n, M, A_i) \right] \\
&\overset{(d)}{=} \sum_{i=1}^{n} \left[ H(X_i) + H(Y_i|A_i, X_i) + H(Z_i|A_i, Y_i) - H(X_i|X^{i-1}, Y^n, A^n, Z^n, M, \mathrm{h}_1, \mathrm{h}_2) \right. \\
&\qquad \left. - H(Z_i|Z^{i-1}, M, A_i) - H(Y_i|Y^{i-1}, Z^n, M, A_i, \mathrm{h}_2) \right] \\
&\geq \sum_{i=1}^{n} \left[ H(X_i) + H(Y_i|A_i, X_i) + H(Z_i|A_i, Y_i) - H(X_i|Y_i, A_i, \mathrm{h}_{1i}, \mathrm{h}_{2i}) \right. \\
&\qquad \left. - H(Z_i|A_i) - H(Y_i|Z_i, A_i, \mathrm{h}_{2i}) \right] \\
&\overset{(e)}{\geq} \sum_{i=1}^{n} \left[ I(X_i; A_i, Y_i, \psi_{1i}, \psi_{2i}) + H(Y_i|A_i, X_i) + H(Z_i|A_i, Y_i) \right. \\
&\qquad \left. - H(Z_i|A_i) - H(Y_i|Z_i, A_i, \psi_{2i}) \right] - n\delta(\epsilon), \tag{39}
\end{align*}

where (a) follows since $(M_1, M_b)$ is a function of $X^n$ and $M_2$ is a function of $(M_1, M_b, Y^n)$; (b) 
follows since $H(Y^n, Z^n|X^n, M_b) = \sum_{i=1}^{n} H(Y_i, Z_i|Y^{i-1}, Z^{i-1}, X^n, M_b, A_i) = \sum_{i=1}^{n} H(Y_i, Z_i|
X_i, A_i) = H(Y^n, Z^n|X^n, A^n)$; (c) follows since $A_i$ is a function of $M_b$; (d) follows since $\mathrm{h}_1, \mathrm{h}_2$ 
are functions of $(M, Y^n)$ and $(M, Z^n)$, respectively; and (e) follows since entropy is non-negative 
and by Fano's inequality. Next, from (39) we have 
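\begin{align*}
n(R_1 + R_2 + R_b) &\geq \sum_{i=1}^{n} \left[ I(X_i; A_i, Y_i, \psi_{1i}, \psi_{2i}) + H(Y_i|A_i, X_i) + H(Z_i|A_i, Y_i) - H(Z_i|A_i) \right. \\
&\qquad \left. - H(Y_i, Z_i|A_i, \psi_{2i}) + H(Z_i|A_i, \psi_{2i}) \right] - n\delta(\epsilon) \\
&= \sum_{i=1}^{n} \left[ I(X_i; A_i, Y_i, \psi_{1i}, \psi_{2i}) + H(Y_i|A_i, X_i) - H(Z_i|A_i) - H(Y_i|A_i, \psi_{2i}) \right. \\
&\qquad \left. + H(Z_i|A_i, \psi_{2i}) \right] - n\delta(\epsilon) \\
&= \sum_{i=1}^{n} \left[ I(X_i; A_i, Y_i, \psi_{1i}, \psi_{2i}) - I(X_i; Y_i|A_i, \psi_{2i}) - I(Z_i; \psi_{2i}|A_i) \right] - n\delta(\epsilon) \\
&\overset{(a)}{=} \sum_{i=1}^{n} \left[ I(X_i; A_i, Y_i, \psi_{1i}, \psi_{2i}) - I(X_i; Y_i|A_i, \psi_{2i}) - I(Y_i; A_i|X_i) - I(Z_i; Y_i|A_i) \right. \\
&\qquad \left. + I(Y_i; A_i, \psi_{2i}|X_i) + I(Z_i; Y_i|\psi_{2i}, A_i) \right] - n\delta(\epsilon) \\
&\overset{(b)}{=} \sum_{i=1}^{n} \left[ I(X_i; A_i, Y_i, \psi_{1i}, \psi_{2i}) - I(X_i; Y_i|A_i, \psi_{2i}) + I(X_i; A_i) - I(Y_i, X_i; A_i) \right. \\
&\qquad \left. - I(Z_i; X_i, Y_i|A_i) + I(X_i, Y_i; A_i, \psi_{2i}) + I(Z_i; X_i, Y_i|\psi_{2i}, A_i) - I(X_i; A_i, \psi_{2i}) \right] - n\delta(\epsilon) \\
&= \sum_{i=1}^{n} \left[ I(X_i; A_i) + I(X_i; A_i, Y_i, \psi_{1i}, \psi_{2i}) + I(X_i, Y_i; A_i, \psi_{2i}, Z_i) - I(A_i, Z_i; X_i, Y_i) \right. \\
&\qquad \left. - I(X_i; Y_i, A_i, \psi_{2i}) \right] - n\delta(\epsilon) \\
&= \sum_{i=1}^{n} \left[ I(X_i; A_i) + I(X_i; A_i, Y_i, \psi_{1i}, \psi_{2i}) + I(X_i, Y_i; \psi_{2i}|A_i, Z_i) - I(X_i; Y_i, A_i, \psi_{2i}) \right] - n\delta(\epsilon) \\
&= \sum_{i=1}^{n} \left[ I(X_i; A_i) + I(X_i, Y_i; \psi_{2i}|A_i, Z_i) + I(X_i; \psi_{1i}|A_i, Y_i, \psi_{2i}) \right] - n\delta(\epsilon), \tag{40}
\end{align*}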

where (a) is true since 

\begin{align*}
& I(Y_i; A_i|X_i) + I(Z_i; Y_i|A_i) - I(Y_i; A_i, \psi_{2i}|X_i) - I(Z_i; Y_i|\psi_{2i}, A_i) \\
&= H(Y_i|X_i) - H(Y_i|X_i, A_i) + H(Z_i|A_i) - H(Z_i|A_i, Y_i) - H(Y_i|X_i) + H(Y_i|X_i, A_i) \\
&\quad - H(Z_i|\psi_{2i}, A_i) + H(Z_i|A_i, Y_i) \\
&= H(Z_i|A_i) - H(Z_i|\psi_{2i}, A_i);
\end{align*}

(b) follows because $I(Z_i; X_i, Y_i|A_i) = I(Z_i; Y_i|A_i)$ and $I(Z_i; X_i, Y_i|A_i, \psi_{2i}) = I(Z_i; Y_i|A_i, \psi_{2i})$. 

Next, define $\hat{X}_{ji} = \psi_{ji}(X^n)$ for $j = 1, 2$ and $i = 1, 2, \ldots, n$ and let $Q$ be a random variable 
uniformly distributed over $[1, n]$ and independent of all the other random variables and with 
$X = X_Q$, $Y = Y_Q$, $A = A_Q$. From (36), we have 

$$R_b \geq H(X|Q) - H(X|A, Q) \overset{(a)}{\geq} H(X) - H(X|A) = I(X; A),$$
where (a) follows since $X^n$ is i.i.d. and since conditioning decreases entropy. Next, from (37), 
we have 

\begin{align*}
R_1 + R_b &\geq I(X; A|Q) + I(X; \hat{X}_1, \hat{X}_2|Y, A, Q) \\
&\overset{(a)}{\geq} I(X; A) + I(X; \hat{X}_1, \hat{X}_2|Y, A),
\end{align*}

where (a) follows since $X^n$ is i.i.d., since conditioning decreases entropy and by the problem 
definition. From (38), we also have 

\begin{align*}
R_2 + R_b &\geq I(X; A|Q) + I(X, Y; \hat{X}_2|A, Z, Q) \\
&\overset{(a)}{\geq} I(X; A) + H(X, Y|A, Z, Q) - H(X, Y|A, Z, \hat{X}_2) \\
&\overset{(b)}{=} I(X; A) + H(Y|A, Z) + H(X|A, Y, Z) - H(X, Y|A, Z, \hat{X}_2) \\
&= I(X; A) + I(X, Y; \hat{X}_2|A, Z) \\
&\geq I(X; A) + I(X; \hat{X}_2|A, Z),
\end{align*}

where (a) follows since $X^n$ is i.i.d. and since conditioning reduces entropy; and (b) follows by the 
problem definition. Finally, from (40), we have 

\begin{align*}
R_1 + R_2 + R_b &\geq I(X; A|Q) + I(X, Y; \hat{X}_2|A, Z, Q) + I(X; \hat{X}_1|A, Y, \hat{X}_2, Q) \\
&\overset{(a)}{\geq} I(X; A) + H(X, Y|A, Z, Q) - H(X, Y|A, Z, \hat{X}_2) + I(X; \hat{X}_1|A, Y, \hat{X}_2) \\
&\overset{(b)}{=} I(X; A) + H(Y|A, Z) + H(X|A, Y, Z) - H(X, Y|A, Z, \hat{X}_2) + I(X; \hat{X}_1|A, Y, \hat{X}_2) \\
&= I(X; A) + I(X, Y; \hat{X}_2|A, Z) + I(X; \hat{X}_1|A, Y, \hat{X}_2) \\
&\geq I(X; A) + I(X; \hat{X}_2|A, Z) + I(X; \hat{X}_1|A, Y, \hat{X}_2), \tag{41}
\end{align*}

where (a) follows since X n is i.i.d, since conditioning decreases entropy, and by the problem 
definition; and (b) follows by the problem definition. From cost constraint ©, we have 

$$\Gamma + \epsilon \geq \frac{1}{n} \sum_{i=1}^{n} E[\Lambda(A_i)] = E[\Lambda(A)]. \tag{42}$$

Moreover, let $\mathcal{B}$ be the event $\mathcal{B} = \{(\psi_1(X^n) \neq \mathrm{h}_1(M_1, M_b, Y^n)) \wedge (\psi_2(X^n) \neq \mathrm{h}_2(M_2, M_b, Z^n))\}$. 
Using the CR requirement (26), we have $\Pr(\mathcal{B}) \leq \epsilon$. For $j = 1, 2$, we have 



d(Xj,Xj 



i=l 

1 n I 



71 t=l 



< -^E^X^/^+eD, 

n i=i 

(c) 



(43) 



where (a) follows using the fact that $\Pr(\mathcal{B}) \leq \epsilon$ and that the distortion is upper bounded by 
$D_{\max}$; (b) follows by the definition of $\hat{X}_{ji}$ and $\mathcal{B}$; and (c) follows by ©. 